Classifying the Hungarian Web
نویسندگان
چکیده
In this paper we present some lessons learned from building vizsla, the keyword search and topic classification system used on the largest Hungarian portal, [origo.hu]. Based on a simple statistical language model, and the largescale supporting evidence from vizsla, we argue that in topic classification only positive ev-
منابع مشابه
Exploring Relevance as Truth Criterion on the Web and Classifying Claims in Belief Levels
The Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the Web. Moreover, different websites often provide conflicting information on a subject. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this paper...
متن کاملA Technique for Improving Web Mining using Enhanced Genetic Algorithm
World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines used conventional methods to retrieve information on the Web; however, the search results of these engines are still able to be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms which search according to the user interests...
متن کاملDependency Parsing for Identifying Hungarian Light Verb Constructions
Light verb constructions (LVCs) are verb and noun combinations in which the verb has lost its meaning to some degree and the noun is used in one of its original senses. They often share their syntactic pattern with other constructions (e.g. verbobject pairs) thus LVC detection can be viewed as classifying certain syntactic patterns as light verb constructions or not. In this paper, we explore a...
متن کاملAccurate Comparison of Binary Executables
As the volume of malware inexorably rises, comparison of binary code is of increasing importance to security analysts as a method of automatically classifying new malware samples; purportedly new examples of malware are frequently a simple evolution of existing code, whose differences stem only from a need to avoid detection. This paper presents a polynomial algorithm for calculating the differ...
متن کاملMorphological annotation of Old and Middle Hungarian corpora
In our paper, we present a computational morphology for Old and Middle Hungarian used in two research projects that aim at creating morphologically annotated corpora of Old and Middle Hungarian. In addition, we present the web-based disambiguation tool used in the semi-automatic disambiguation of the annotations and the structured corpus query tool that has a unique but very useful feature of m...
متن کامل